Systematic Entomology
○ Wiley
Preprints posted in the last 90 days, ranked by how well they match Systematic Entomology's content profile, based on 11 papers previously published here. The average preprint has a 0.00% match score for this journal, so anything above that is already an above-average fit.
Dury, G. J.; Windsor, D. M.; Sharanowski, B. J.; Sekerka, L.; Bede, J. C.
This study reconstructs the phylogeny of an expansive set of Neotropical leaf beetles in the subfamily Chrysomelinae. From 33 species in the genus Platyphora Gistel, and an additional 37 species representing 16 beetle genera, five genes, three nuclear, and two mitochondrial, were sequenced and used to obtain a well-supported molecular phylogeny using both Bayesian and Maximum Likelihood. The subtribes Chrysomelina and Doryphorina (sensu Daccordi 1982) were monophyletic, while the genus Platyphora was polyphyletic. The genus Leptinotarsa Chevrolat is confirmed to be distinct from Stilodes Chevrolat. Host plant family was recorded for both adults and larvae using direct observations where possible. Ancestral host plant use was reconstructed using Bayesian trait analyses. A complicated history of host plant switches among a restricted set of plant families is revealed: In the paraphyletic Platyphora, one clade that includes Proseicela and Leptinotarsa had two switches from Asclepiadoideae to Solanaceae, one switch to Moraceae, and one switch to Malpighiaceae; another Platyphora clade had switches between Asteraceae and Rauvolfioideae, and from Rauvolfioideae to Asclepiadoideae, with other members of the same clade feeding on Boraginaceae and Convolvulaceae. All species included in the clade containing Tritaenia and Stilodes fed on Malpighiaceae, and all species included in the Cosmogramma and Calligrapha clade fed on Malvaceae.
Nunez, R.; Bodenheim, A.; Alvarez, Y.; Wahlberg, N.; Espeland, M.
We provide the first comprehensive analysis of the origin of two enigmatic Satyrinae genera of uncertain affinities. Calisto, the only Satyrinae genus from the West Indies and endemic to these islands, has resisted numerous attempts at phylogenetic placement, regardless of the data type or methods used. Llorenteana, a monotypic genus from northwestern Mexico, has never been included in a molecular phylogenetic study, and past authors have placed it in five different genera and subtribes. We used mostly published genomic data, but also newly sequenced whole genome data from museum specimens and old DNA extracts, extracted BUSCO genes and prepared several datasets. These datasets differed in the degree of heterogeneity and saturation, the number of nucleotide positions used (all positions or only the first two), and were analyzed as nucleotides or as amino acids. We employed several methods for phylogenetic reconstruction using both partitioned and mixture models, as well as ASTRAL, and we inferred divergence times and ancestral areas of origin for Calisto and Llorenteana. The phylogenetic placement of Calisto varied among datasets when we used partitioned models and ASTRAL; however, most datasets resulted in the same relationships under mixture models. Our results suggest that Calisto is part of a clade of Old World origin that colonized the New World from north to south, thus sharing ancestry with Nearctic taxa. Llorenteana constitutes one of the earliest splits within the Euptychiina, a subtribe of Neotropical origin, but descending together with the Pronophilina from Nearctic ancestors. We propose the recognition of Erebiina stat. rev. as the only subtribe comprising the former Calistina syn. nov., Callerebiina syn. nov., Maniolina syn. nov., and Ypthimina syn. nov.
Uche Dike, R.
Macromiidae is a widely distributed lineage of libelluloid dragonflies with a largely allopatric genus-level distribution across the Holarctic, Afrotropical, Australasian, and Indo-Malayan regions. Previous studies involving this family have been complicated by morphological convergence and limited phylogenetic sampling. Here, we present the most densely sampled phylogenetic framework for Macromiidae to date, using Anchored Hybrid Enrichment data from 62 of the 125 described species. Our sampling represents all four genera and major geographic regions, including Libelluloid and Cordulegastrid outgroups. Maximum likelihood recovered three major lineages: Epophthalmia, Phyllomacromia, and Macromia sensu lato, with Epophthalmia strongly supported as sister to Phyllomacromia. Didymops was not recovered as monophyletic and was placed within Macromia, although deeper relationships within the Macromia complex showed some gene tree discordance. We additionally scored seven male genitalic characters and reconstructed their evolution across a dated phylogeny. We found that these traits varied heavily in phylogenetic signal, with some characters supporting the major clades and others showing a high degree of homoplasy. Fossil-calibrated divergence time estimation placed the crown origin of Macromiidae in the late Oligocene (24 Ma), with other major intrafamilial divergences concentrated in the Miocene. Historical biogeographic reconstructions consistently supported Afrotropical origins for Phyllomacromia, Indo-Malayan centered ancestry for Epophthalmia, and a multi-region history for Macromia + Didymops spanning Indo-Malayan, Australasian, and Nearctic regions. Habitat reconstructions favored lentic ancestry for Macromiidae, and diversification rate variation was best explained by trait-independent models rather than lentic/lotic habitat association.
Crossay, T.; Polo-Marcial, M. H.; Esmaeilzadeh-Salestani, K.; de Queiroz, M. B.; de Lima, J. L. R.; Lara-Perez, L. A.; de la Fuente, J. I.; Szczecinska, S.; Wong, M.; Tedersoo, L.; Goto, B. T.; Magurno, F.
Diversisporales comprises species with worldwide distribution that produce glomoid, otosporoid, or tricisporoid spores. The recent reorganization of the order by Oehl et al. (2016) recognizes two families, Diversisporaceae and Corymbiglomeraceae, comprising one and five genera, respectively. Several Glomeromycotan specimens collected in northern and southeastern Mexico and in French Polynesian atolls were characterized using both morphological and molecular analyses. Phylogenetic inference revealed that they represent new members in the Diversisporales, supporting the reorganization of the genus Redeckera into three independent lineages: Albocarpum gen. nov., with A. arenaceum sp. nov., A. leptohyphum sp. nov., and A. fulvum comb. nov.; Pulvinocarpum pulvinatum gen. et comb. nov.; and Redeckera, which retains five species, including R. varelae sp. nov. In addition, we described Melanocarpum mexicanum gen. et sp. nov. and Diversispora papillosa sp. nov. A broader phylogeny, based on eDNA sequences and representative of Diversisporales species, including the newly described taxa, further supported the split of Redeckera and suggested three additional clades likely corresponding to a new family and two new genera, awaiting the discovery of representative morphospecies to be formally described. Using eDNA sequence metadata, the occurrences of the newly described taxa were mapped, allowing us to recognize distribution patterns, mostly in the pantropical zone, distinguish widespread and rare species, and suggest possible endemisms. Finally, the coexistence of species forming large sporocarps (A. fulvum and A. leptohyphum) alongside species forming spores in loose aggregates (A. arenaceum) prompted us to propose a possible sporulation dimorphism in Albocarpum, an argument previously raised to explain the nested placement of Corymbiglomus and Paracorymbiglomus within the Redeckera clade.
Kerr, A. M.; Papeschi, S.
We present new distributional records of Argiope spiders in India, based on more than 10,000 digital images of the genus from the region curated by iNaturalist (www.inaturalist.org). Notable range expansions to India are documented for three species: A. chloreides Chrysanthus, 1961, A. mangal Koh, 1991, and A. sector (Forsskål, 1776). Second, previously unrecorded field characters, updated distributional data, and a re-examination of published descriptions of type material support the resurrection of A. undulata Thorell, 1887 as a valid species, long treated as a synonym of A. pulchella Thorell, 1881. Finally, we report the first in situ photographic records of live specimens of the rarely documented A. caesarea Thorell, 1897 and A. macrochoera Thorell, 1891. These varied findings for a small and conspicuous taxon highlight the value of online community-science platforms for documenting the arachnofauna of a biodiverse region, and illustrate the need for continued taxonomic review, even within well-known genera.
Nojiri, K.; Inoshita, K.; Sugeno, H.; Taga, T.
Animal naming is fundamental to scientific communication, yet it also reflects the historical and cultural contexts in which names are bestowed. Scientific names function as taxonomic labels and enduring records of human engagement with nature. Owing to this dual role, species names have recently attracted increasing attention from historical and humanities perspectives, both for their informative value and for the biases they may encode. To objectively assess these patterns at a large scale, we investigated etymological trends across Animalia using a comprehensive dataset of species names. Our analyses reveal that naming practices are shaped by a combination of historical events, taxonomic traditions, and cultural influences. Major global disturbances coincided with marked declines in species descriptions, whereas advances in biological techniques were associated with shifts in naming practices. Furthermore, etymological trends differed among phyla, indicating that taxonomic communities vary in their naming conventions. These differences suggest that taxonomists' preferences, shared aesthetics, available knowledge, and cultural biases are differentially preserved in scientific names. Together, our results demonstrate that zoological nomenclature constitutes a valuable archive for understanding the historical and cultural dimensions of taxonomy.
Mays, H. L.; McKay, B. D.; Nishiumi, I.; Yao, C.; Zou, F.; Boyd, M.; DeRaad, D.; Lin, R.; Kawakami, K.; Kim, C.-H.; Kubatko, L. L.; Moyle, R.
Here we untangle the systematics of the Asiatic white-eye complex (Zosterops spp.) to better understand the early stages of a recent island radiation. We adopt an integrative approach involving allelic data, genome-scale single nucleotide polymorphisms (SNPs), and museum-based morphometrics coupled with comprehensive sampling to provide the most holistic understanding of the group to date. The island lineages of Asiatic white-eyes across Indonesia, the Philippines, East Asia, and the adjacent oceanic islands of the Western Pacific underwent a deep split separating Zosterops everetti and Z. nigrorum in the Philippines from a very rapid radiation including Z. japonicus, Z. meyeni, and Z. montanus in the Philippines, Japan, and Indonesia. Z. nigrorum catarmanensis on Camiguin South in the Philippines was found to be nested within Z. montanus, and a species limit between Z. nigrorum populations on Panay and Luzon was strongly supported. Phylogenetic splits and population structure were detected within the clade containing Z. japonicus, Z. meyeni, and Z. montanus. A well-supported split separates a northern group including Northern Philippines Z. montanus subspecies, Z. meyeni, and Z. japonicus from the southerly Z. montanus taxa. This creates a paraphyletic Z. montanus. However, based on speciation rates within the broader Asiatic white-eye clade, this break likely does not yet represent evolutionarily independent species lineages. Morphological evolution is taking place within the Asiatic white-eyes, especially within the robust, large-billed subspecies of Z. japonicus on the oceanic islands of Japan and in the newly identified yellow morph of Z. montanus on Camiguin South.
Serra Silva, A.; Telford, M. J.
The phyla making up the major animal clade of Spiralia have been clear since the advent of molecular phylogenetics; the relationships between these spiralian phyla have not. The lack of consensus over the relationships between these important animal phyla might be a clue implying their emergence in an explosive radiation. Focusing on the five largest spiralian phyla (Annelida, Brachiopoda, Mollusca, Nemertea and Platyhelminthes) and using two phylogenomic datasets, we have applied site-bootstrapping and taxon-jackknifing to explore this example of taxonomic instability. Analyses on the 105 possible rooted trees relating them showed that interphylum branches are very short. Preference for rooting Spiralia on Platyhelminthes is a long-branch artefact. Most analyses on the 15 unrooted trees showed a preference for the same topology, but the support over other solutions was non-significant. We conclude that the spiralian phyla emerged in rapid succession, resulting in a difficult-to-resolve radiation. The deep history we infer for Spiralia has wide-ranging implications for our interpretation of Cambrian fossils and for the evolution of traits such as biomineralization, segmentation and larvae. Impact Statement: Analyses of two independent phylogenomic datasets suggest an explosive radiation at the origin of Spiralia, with implications for understanding the group's evolutionary history.
Leone, M.; Rech De Laval, V.; Drage, H. B.; Waterhouse, R. M.; Robinson-Rechavi, M.
Integrating taxonomic data from various sources presents a significant challenge in biodiversity research, due to non-standardized nomenclature and evolving species classifications. Discrepancies between major repositories like the Global Biodiversity Information Facility (GBIF) and the National Center for Biotechnology Information (NCBI), as well as citizen science platforms such as iNaturalist, lead to fragmented and sometimes inaccurate biological data. We present TaxonMatch, a tool designed to address these challenges. TaxonMatch aligns taxonomic names, resolves synonymy, and corrects typographical and structural inconsistencies across databases. We show how it can be used to build a common backbone arthropod taxonomy over NCBI, GBIF and iNaturalist, to find the closest molecular data to a given fossil, and to identify IUCN endangered species with molecular data. TaxonMatch provides a cohesive taxonomic framework and a consistent taxonomic backbone, and can be applied to any taxonomic source. The tool is available at https://github.com/MoultDB/TaxonMatch.
Kuo, P.-C.; Benson, R.; Field, D. J.
In birds, the quadrate bone serves as a hinge articulating with the lower jaw and the skull, playing an important mechanical role in the feeding apparatus. Avian cranial kinesis is dependent on the streptostylic quadrate transferring force from the adductor muscles at the back of the skull toward the beak, as part of a four-bar mechanical linkage to elevate and depress the bill. The complex morphology of the bird quadrate has led to authors adopting a range of alternative terminologies to describe the same anatomical structures and character states, impeding clarity of communication and presenting a barrier to progress in our understanding of the evolution of this important component of the avian feeding apparatus. Here, we reconcile terminological discord among previous studies on avian quadrate morphology and propose a stable nomenclature for future work. To characterise the considerable variation in quadrate form across crown bird diversity, we present an extensive anatomical atlas of the avian quadrate and summarise major patterns of quadrate morphological variation across extant avian phylogeny. In addition, we investigate macroevolutionary patterns in avian quadrate morphology, incorporating comparisons of crown birds and Late Cretaceous near-crown stem birds. We demonstrate that quadrate characters are useful for diagnosing a range of major avian subclades, and suggest that numerous distinctive features are likely to be associated with important biomechanical consequences. This investigation has implications for resolving the unsettled phylogenetic relationships of extinct bird clades such as Pelagornithidae and Gastornithiformes, as well as controversial relationships within several extant groups.
Nanjala, C.; Simpson, L.; Hu, A.-Q.; Patel, V.; Nicholls, J. A.; Bent, S. J.; Gale, S. W.; Fischer, G. A.; Goedderz, S.; Schuiteman, A.; Crayn, D.; Clements, M. A.; Nargar, K.
Understanding evolutionary relationships in hyperdiverse plant groups remains a major challenge in systematics. The orchid genus Bulbophyllum, the second largest genus of flowering plants, represents an exceptional example of phylogenetic and morphological complexity. Relationships, particularly within the species-rich Asian clade, have remained poorly resolved due to extensive morphological variation and limited resolution in previous phylogenetic studies. Here, we reconstructed phylogenetic relationships using 63 plastid genes from 355 specimens representing 322 species and 65 of the 97 recognised sections of Bulbophyllum. Our analyses confirmed that the genus comprises five major evolutionary lineages comprised of species predominantly from Australasia, Madagascar, Continental Africa, Neotropics, and Asia. We provide the first robust phylogenetic evidence for a dichotomous split within the Asian clade into two well-supported lineages: the Asian-Malesian clade and the Malesian-Papuasian clade, with the latter containing a strongly supported Papuasian subclade. Additionally, this study supports the monophyly of several currently recognised sections while clarifying relationships in previously problematic groups. This study provides the most comprehensive plastid-based phylogenomic framework for Bulbophyllum to date and establishes a foundation for future taxonomic revision and integrative analyses of diversification and trait evolution within this hyperdiverse genus.
De Vivo, M.
Open genomic data can help us to understand patterns in biodiversity. They can also be helpful for identifying morphologically similar species. An example of a taxon in which this can be useful is Nematomorpha, one of the less studied animal phyla, for which data have started to become available recently and where species identification can be hard. In this study, I initially planned to evaluate the usage of mitochondrial data for population analyses using an RNA sequencing (RNA-seq) dataset labelled as belonging to Chordodes fukuii. After surprising results using extracted sequences from the barcoding gene cytochrome c oxidase subunit I (COXI), I evaluated species delimitation using a mix of a previously released double-digest restriction-site-associated DNA sequencing (ddRADseq) SRA dataset plus the RNA-seq one. PCA, R analyses through "adegenet" and ADMIXTURE confirmed the presence of two species in the RNA-seq dataset, which should be labelled as C. formosanus and C. japonensis; however, some individuals labelled as C. japonensis according to COXI clustered with C. formosanus specimens or had some C. formosanus ancestry when more data were used, indicating potential introgression or incomplete lineage sorting. The study shows how previously released data can be used for evaluating species delimitation, potential previous demographic events and potential needs in DNA barcoding and genomics for avoiding future misidentification of morphologically similar species.
van den Burg, M. P.; Thibaudier, J.
Understanding behavioral differences between non-native and closely related endangered species could be important to aid conservation management. In volume 169 of Zoology, Bels et al. (2025) reported on their comparison of display-action-patterns (DAP) between native Iguana delicatissima and non-native iguanas present on islands of the Guadeloupe Archipelago in the Caribbean Lesser Antilles. Here, we address conceptual and methodological concerns about their work and reanalyze their data given our proposed corrections, primarily a literature-informed adjustment of their "species" category. We additionally utilize online videos from South American mainland I. iguana populations, from where the non-native iguanas in the Guadeloupe Archipelago originate, to better understand the different DAPs between native and non-native iguanas in the Guadeloupe Archipelago. Significant differences in DAP characteristics among "species" categories (native I. delicatissima, non-native iguanas, and hybrids) show that Bels et al. (2025) oversimplified their data analyses by merging all non-native populations into one group. This result indicates the presence of behavioral variation among subpopulations within widely hybridizing iguanid populations, which has been poorly studied. Additionally, videos from mainland populations across two major mitochondrial clades of Iguana iguana show that non-native iguanas on Guadeloupe retained DAP characteristics of those populations from which they originate. We discuss these findings in light of the proposed hypotheses put forward by Bels et al. (2025), of which two can be excluded. Overall, our reanalysis shows that studies focusing on characteristics within settings of complex hybridization in diverse species should acknowledge this complexity.
Nagel, A. A.; Landis, M. J.
Ancestral state reconstruction is a classical problem of broad relevance in phylogenetics. Likelihood-based methods for reconstructing ancestral states under discrete character models, such as Markov models, have proven extremely useful, but only work so long as the assumed model yields a tractable likelihood function. Unfortunately, extending a simple but tractable phylogenetic model to possess new, but biologically realistic, properties often results in an intractable likelihood, preventing its use in standard modeling tasks, including ancestral state reconstruction. The rapid advancement of deep learning offers a potential alternative to likelihood-based inference of ancestral states, particularly for models with intractable likelihoods. In this study, we modify the phylogenetic deep learning software PHYDDLE to conduct ancestral state reconstruction. We evaluate PHYDDLE's performance under various methodological and modeling conditions, while comparing to Bayesian inference when possible. For simple models and small trees, its performance resembles the performance of Bayesian inference, but worsens as tree size increases. While PHYDDLE still performs adequately for more complex models, such as speciation and extinction models, the estimates differ more from Bayesian inference in comparison with simpler models. Lastly, we use PHYDDLE to infer ancestral states for two empirical datasets, one of the ancestral ranges of a subclade of the genus Liolaemus and ancestral locations for sequences from the 2014 Sierra Leone Ebola virus disease outbreak.
Hayes, R. A.; Kern, A. D.; Ponisio, L. C.
Pollen is a robust and widespread substance that captures a historical snapshot of a specific time and place, and it can be used to track movements through space by examining the pollen deposited on various objects. Palynology, the study of pollen, is used across fields such as conservation, natural history, and forensics, where it is particularly useful for tracing the origin and movement of objects. However, pollen has remained underutilized due to the difficulty of distinguishing many pollen taxa beyond the family level and limited pollen reference material to support location predictions. With recent developments in pollen DNA metabarcoding these issues have been rectified, but much of the available pollen data are primarily from wind-pollinated species, which are widespread and less informative of specific sample locations. Bee-collected pollen presents an untapped resource in training predictive models to geolocate sample origin. Here we compiled bee-collected pollen DNA sequence relative abundance data from three projects in the western U.S. and assessed the accuracy of supervised machine learning models to predict the location of sample origin based solely on pollen assemblage, without the need of incorporating additional data. Random Forest and k-Nearest Neighbors models yielded high accuracy across all projects. We also found that models trained on taxonomically clustered pollen assigned sequence variants (ASVs) performed slightly better than those trained on raw sequence data, but the difference was minor, indicating that models trained on raw sequence data can reliably predict location and avoid the time-consuming taxonomic assignment process. Our results demonstrate the utility of repurposing bee-collected pollen for geolocation and provide a framework for employing supervised machine learning in future geolocation efforts. 
Highlights:
- Bee-collected pollen metabarcoding data was used to accurately predict sample origin
- Random Forest and k-Nearest Neighbors algorithms were most accurate with lowest error
- Taxonomically-classified and raw DNA sequence data training sets performed comparably
Khakurel, B.; Hoehna, S.
The rate of evolution of a single morphological character is not homogeneous across the phylogeny and this rate heterogeneity varies between morphological characters. However, traditional models of morphological character evolution often assume that all characters evolve according to a time-homogeneous Markov process, which applies uniformly across the entire phylogeny. While models incorporating among-character rate variation alleviate the assumption of the same rate for all characters, they still fail to address lineage-specific rate variation for individual characters. The covarion model, originally developed for molecular data to model the invariability of some sites for parts of the phylogeny, provides a promising framework for addressing this issue in morphological phylogenetics. In this study, we extend the covarion model in RevBayes to morphological character evolution, which we call the covariomorph model, and apply it to a diverse range of morphological datasets. Our covariomorph model utilizes multiple rate categories derived from a discretized probability distribution, which scales rate matrices accordingly. Characters are allowed to evolve within any of these rate categories, with the possibility of switching between rate categories during the evolutionary process. We verified our implementation of the covariomorph model with the help of simulations. Additionally, we examined 164 empirical datasets, finding patterns of rate heterogeneity compatible with covarion-like dynamics in approximately half of them. Upon further examination of two focal datasets that exhibited covarion-like rate variation, we found that the covariomorph model provides a more nuanced approach to incorporate rate variation across lineages, significantly affecting the resulting tree topology and branch lengths compared to traditional models. The observed sensitivity of branch lengths to model choice underscores potential implications of this approach for divergence time estimation and evolutionary rate calculations. By accounting for lineage- and character-specific rate shifts, the covariomorph model offers a robust framework to improve the accuracy of morphological phylogenetic inference.
Carmelet-Rescan, D.; Malmqvist, G.; Kumpitsch, L.; Sammarco, B.; Choo, L. Q.; Butlin, R.; Raffini, F.
Understanding morphological variation is crucial for the study of speciation and for conservation as it helps in assessing biodiversity and predicting responses to environmental changes. These approaches are broadly applicable but are especially valuable in marine environments, where species are often elusive, difficult to study, and face heightened threats from rapid environmental shifts. The marine snail Littorina saxatilis is notable for its extensive polymorphism in shell shape, size, and colour, with ecotypes that evolve in response to environmental forces including wave exposure and crab predation. Morphometric tools have been central to investigating the mechanisms driving this phenotypic divergence; yet, a direct comparison of their methodological efficacy is lacking. Here, we took advantage of L. saxatilis ecotypes to contrast three morphometric approaches: elliptical Fourier analysis (EFA), landmarks-based geometric morphometrics (GM), and the growth-based model implemented in the ShellShaper software (SS). We assessed their clustering power, biological interpretability, robustness to measurement error and transferability among datasets. Our findings provide insights to guide method selection in studies aimed at exploring morphological variation: EFA is better suited for high-throughput screening and describing intermediate shapes; SS offers superior clustering power with highly interpretable growth parameters; and GM is best for detailed anatomical studies but is less efficient for large datasets. We provide guidelines to align method selection with specific research goals, balancing analytical efficiency with the required morphological and biological insight. By following this framework, researchers can ensure that robust morphological analysis is achieved, which is essential not only for elucidating mechanisms of adaptation and speciation but also for effective management and conservation of marine biodiversity.
Takazawa, Y.; Takeda, A.; Hayamizu, M.; Gascuel, O.
Phylogenetic analyses often require the summarization of multiple trees, e.g., in Bayesian analyses to obtain the centroid of the posterior distribution of trees, or to determine the consensus of a set of bootstrap trees. The majority-rule consensus tree is the most commonly used. It is easy to compute and minimizes the sum of Robinson-Foulds (RF) distances to the input trees. In mathematical terms, the majority-rule consensus tree is the median of the input trees with respect to the RF distance. However, due to the coarse nature of RF distance, which only considers whether two branches induce exactly the same bipartition of the taxa or not, highly unresolved trees can be produced when the phylogenetic signal is low. To overcome this limitation, we propose using median trees with respect to finer-grained dissimilarity measures between trees. These measures include a quartet distance between tree topologies, and transfer distances, which quantify the similarity between bipartitions, in contrast to the 0/1 view of RF. We describe fast heuristic consensus algorithms for transfer-based tree dissimilarities, capable of efficiently processing trees with thousands of taxa. Through evaluations on simulated datasets in both Bayesian and bootstrapping maximum-likelihood frameworks, our results show that our methods improve consensus tree resolution in scenarios with low to moderate phylogenetic signal, while providing better or comparable dissimilarities to the true phylogeny. Applying our methods to Mammal phylogeny and a large HIV dataset of over nine thousand taxa confirms the improvement with real data. These results demonstrate the usefulness of our new consensus tree methods for analyzing the large datasets that are available today. Our software, PhyloCRISP, is available from https://github.com/yukiregista/PhyloCRISP.
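The baseline described in this abstract, the majority-rule consensus as a median under the RF distance, can be illustrated with a minimal Python sketch. This is not the PhyloCRISP implementation and does not use the transfer or quartet distances the authors propose. It assumes each unrooted tree is represented as a set of non-trivial bipartitions; the function names (`canon`, `rf_distance`, `majority_rule`) and the five-taxon example are hypothetical.

```python
from collections import Counter

# Hypothetical five-taxon example; names and trees are illustrative only.
TAXA = frozenset("ABCDE")

def canon(side):
    """Represent a bipartition by the side that does not contain taxon 'A'."""
    side = frozenset(side)
    return TAXA - side if "A" in side else side

def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance: number of bipartitions found in only one tree."""
    return len(tree_a ^ tree_b)

def majority_rule(trees):
    """Keep every bipartition present in more than half of the input trees."""
    counts = Counter(split for tree in trees for split in tree)
    return frozenset(s for s, c in counts.items() if c > len(trees) / 2)

# Each tree is the set of its non-trivial bipartitions.
t1 = frozenset({canon("AB"), canon("DE")})
t2 = frozenset({canon("AB"), canon("CD")})
t3 = frozenset({canon("AC"), canon("DE")})
trees = [t1, t2, t3]

consensus = majority_rule(trees)
total = sum(rf_distance(consensus, t) for t in trees)
print(sorted("".join(sorted(s)) for s in consensus), total)
```

Because RF only counts exact bipartition matches, a split that is almost shared (differing by one taxon) contributes as much to the distance as a completely different split, which is the coarseness the abstract refers to.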
Koshkarov, A.; Tahiri, N.
Phylogenetic trees represent the evolutionary histories of taxa and support tasks such as clustering and Tree of Life reconstruction. Many established comparison methods, including the Robinson-Foulds (RF) distance, assume identical taxon sets. A methodological gap remains for trees with distinct but overlapping taxa. Existing approaches either prune non-common leaves, which can discard information, or complete both trees such that they share the same taxa. Completion is more comprehensive, but current methods typically ignore branch lengths, which are essential for identifying evolutionary patterns. This paper introduces k-Nearest Common Leaves (k-NCL), an algorithm for completing rooted phylogenetic trees defined on different but overlapping taxa. The method uses branch lengths and topological characteristics and does not rely on a specific distance measure. The k-NCL algorithm is designed to preserve evolutionary relationships in the trees under comparison. The running time is O(n^2), where n is the size of the union of the two leaf sets. Additional properties include preservation of original distances and topology, symmetry, and uniqueness of the completion. Implemented in Python, k-NCL is evaluated on biological datasets of amphibians, birds, mammals, and sharks. Experimental results show that RF combined with k-NCL improves phylogenetic tree clustering performance compared to the RF(+) tree completion approach. Availability and implementation: An open-source implementation of k-NCL in Python and the datasets used in this study are available at https://github.com/tahiri-lab/KNCL.
Milkey, A.; Lewis, P. O.
A new Bayesian measure of phylogenetic information content is introduced based on geodesic distances in treespace. The measure is based on the relative variance of phylogenetic trees sampled from the posterior distribution compared to the prior distribution. This ratio is expected to equal 1 if there is no information in the data about phylogeny and 0 if there is complete information. Trees can be scaled to have the same mean tree length to avoid dominance by edge length information and focus on topological information. The method scales well, requiring only that a valid sample can be obtained from both prior and posterior distributions. We show how dissonance (information conflict) among data sets can also be estimated. Both simulated and empirical examples are provided to illustrate that the new approach produces sensible and intuitive results.
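The posterior-to-prior variance ratio described in this abstract can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not the authors' method: trees are stood in for by plain coordinate tuples, Euclidean distance (`math.dist`) replaces the geodesic distance in treespace, and variance is estimated as half the mean squared pairwise distance. The function names `dispersion` and `variance_ratio` are hypothetical.

```python
import math
from itertools import combinations

def dispersion(sample, dist):
    """Half the mean squared pairwise distance (equals the unbiased sample
    variance when dist is Euclidean). A stand-in for variance in treespace."""
    pairs = list(combinations(sample, 2))
    return sum(dist(a, b) ** 2 for a, b in pairs) / (2 * len(pairs))

def variance_ratio(posterior, prior, dist):
    """Ratio near 1: data add little information; ratio near 0: data are
    highly informative."""
    return dispersion(posterior, dist) / dispersion(prior, dist)

# Assumption: each "tree" is a tuple of coordinates, not a real tree object.
prior = [(0.0,), (2.0,), (4.0,)]
posterior = [(1.9,), (2.0,), (2.1,)]

ratio = variance_ratio(posterior, prior, math.dist)
print(round(ratio, 4))
```

Passing the distance function as a parameter keeps the sketch independent of the tree representation, so a real geodesic distance could be substituted without changing the ratio calculation.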